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Abstract 

The Gauss-Newton with approximated tensors (GNAT) method is a nonlinear model reduction method 
that operates on fully discrctizcd computational models. It achieves dimension reduction by a Petrov- 
Galerkin projection associated with residual minimization; it delivers computational efficency by a hyper- 
reduction procedure based on the 'gappy POD' technique. Originally presented in Ref. [T], where it was ap- 
plied to implicit nonlinear structural-dynamics models, this method is further developed here and applied to 
the solution of a benchmark turbulent viscous flow problem. To begin, this paper develops global state-space 
error bounds that justify the method's design and highlight its advantages in terms of minimizing compo- 
nents of these error bounds. Next, the paper introduces a 'sample mesh' concept that enables a distributed, 
computationally efficient implementation of the GNAT method in finite-volume-based computational-fluid- 
dynamics (CFD) codes. The suitability of GNAT for parameterized problems is highlighted with the solution 
of an academic problem featuring moving discontinuities. Finally, the capability of this method to reduce by 
orders of magnitude the core-hours required for large-scale CFD computations, while preserving accuracy, is 
demonstrated with the simulation of turbulent flow over the Ahmed body. For an instance of this benchmark 
problem with over 17 million degrees of freedom, GNAT outperforms several other nonlinear model-reduction 
methods, reduces the required computational resources by more than two orders of magnitude, and delivers 
a solution that differs by less than 1% from its high-dimensional counterpart. 
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1. Introduction 

Computational fluid dynamics (CFD) modeling and simulation tools have become indispensable in many 
engineering applications due to their ability to enhance the understanding of complex fluid systems, reduce 
design costs, and improve the reliability of engineering systems. Unfortunately, many desired high-fidelity 
CFD simulations are so computationally intensive that they can require unaffordable computational resources 
or time to completion, even when supercomputers with thousands of cores are available. Consequently, such 
simulations are often impractical for time-critical applications such as flow control, design optimization, 
uncertainty quantification, and system identification. 

Projection-based nonlinear model-reduction methods constitute a promising approach for bridging the 
gap betwen CFD and such applications. These methods approximate a given high-fidelity model by reducing 
its dimension, i.e., the number of equations and unknowns that describe it. For this purpose, such methods 
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first perform intensive large-scale computations 'offline' to construct a priori a low-dimensional subspace 
onto which they project the high-dimensional model of interest. This leads to a reduced-order model (ROM) 
characterized by low-dimensional operators. Then, in an 'online' stage, they exploit this ROM to compute 
approximate solutions that lie in this pre-computed subspace. 

Unfortunately, the computational cost associated with assembling the ROM's low-dimensional operators 
— matrices and vectors for implicit schemes, and vectors for explicit ones — scales with the large dimension 
of the underlying high-dimensional model. For this reason, projection-based model-reduction methods are 
efficient primarily for problems where the aforementioned operators must be constructed only once, or can 
be assembled a priori. These include linear time-invariant systems [2j [3J , linear stationary systems whose 
operators are affine functions of the input parameters [U [5] , and a class of nonlinear systems characterized 
by quadratic nonlinearities [SJ [8]. Within these contexts, projection-based model-reduction methods 
have been successfully applied to problems in structural dynamics [3J , aerodynamics [U EH Ell EH E3 and 
aeroelasticity QH E3 E3 E3 E3 ES] , amon g others. 

On the other hand, when projection is applied to linear time-varying systems, linear stationary systems 
with nonaffine parameter dependence, or general nonlinear problems, the resulting ROM is costly to assemble. 
This high cost arises from the need to evaluate the high-dimensional nonlinear function (and possibly its 
Jacobian) at each computational step of a solution algorithm. To overcome this roadblock, several approaches 
have been proposed. Such complexity-reduction techniques are sometimes referred to as hyper-reduction 
methods [2Q1 [21] . Several of these techniques are outlined below. 

The 'empirical interpolation' method developed for linear elliptic problems with non-affine parameter 
dependence [55] , as well as for nonlinear elliptic and parabolic problems [53J , reduces the computational cost 
associated with nonlinearities by interpolating the governing nonlinear function at a few spatial locations 
using an empirically derived basis. This method operates directly on the governing partial differential 
equation (PDE) and therefore at the continuous level. Its variant proposed in Ref. [24] relies for the same 
purpose on 'best [interpolation] points' and a POD basis. The hyper-reduction method proposed in Ref. [25j 
[55] extends the empirical-interpolation method to the case of a scalar conservation law discretized by an 
explicit solution algorithm. However, no further extension of this method to CFD has been performed, as 
this is not a straightforward task. 

Alternatively, the trajectory piece-wise linear (TPWL) method developed in [27] operates at the semi- 
discrete level, i.e., on the ordinary differential equation (ODE) obtained after discretizing the PDE in space. 
TPWL constructs a ROM as a weighted combination of linearized models, where each linearization point lies 
on a training trajectory. However, because the resulting ROM never queries the underlying high-dimensional 
model away from the linearization points, this method in principle lacks robustness for highly nonlinear 
problems such as those arising from a large class of CFD applications. 

Another class of hyper-reduction methods reduces computational complexity by computing only a few 
entries of the nonlinear function (and possibly its Jacobian) appearing in the governing ODE. In this paper, 
such methods are referred to as function-sampling methods; these methods also operate at the semi-discrete 
level. They include collocation approaches, which compute a small subset of the entries of the residual vector. 
Ref. [5S] proposes collocation of the nonlinear equations followed by a least-squares solution of the resulting 
overdetermined system of nonlinear equations. Similarly, Refs. [29l [20] propose a collocation of the equations 
followed by a Galerkin projection. Function-reconstruction approaches define another subset of function- 
sampling methods. In contrast to collocation methods, these techniques use the sampled entries of the 
nonlinear function to approximate the entire nonlinear function by interpolation or least-squares regression. 
One such method reconstructs the governing nonlinear function in the least-squares sense, using the same 
basis adopted to represent the state. This approach was developed in for parameter-varying systems 29H 




and for nonlinear dynamical systems discretized by explicit time-integration schemes |30j . Other function- 
reconstruction approaches include the semi-discrete analogs to the empirical and best points interpolation 
methods that have been developed for parameterized nonlinear stationary problems [3TJ [35], and for nonlinear 



^^For such problems, this method is mathematically equivalent to collocation followed by a Galerkin projection when the 
resulting overdetermined system is solved by a Galerkin projection method. 
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dynamics problems [3TJ [33J . 

Although some of the function-reconstruction methods outlined above have been successfully applied to 
large-scale steady-state problems (e.g., see Ref. [32]), it will be shown that many of them lack the robustness 
needed to address highly nonlinear dynamical systems such as those arising from unsteady CFD applications. 

The Gauss-Newton with approximated tensors (GNAT) method pQ is a nonlinear Petrov-Galerkin pro- 
jection method equipped with a function-sampling hyper-reduction scheme. In constrast with the aforemen- 
tioned nonlinear model-reduction methods, it operates at the discrete level, i.e., on the system of nonlinear 
equations arising at each time step, which is obtained after discretizing the PDE in both space and time. 
GNAT is designed around approximations that satisfy consistency and discrete-optimality conditions. As 
such, it has been successfully applied to nonlinear structural-dynamics problems [T] for which it demonstrated 
robustness, accuracy, and excellent CPU performance. 

This work further develops the GNAT methodology and demonstrates its potential for CFD applications. 
Section [2] formulates the problem of interest, and Section [3j provides an overview of GNAT. Within this 
overview, a new consistent snapshot-collection procedure is proposed in Section |3.3.2| Next, Section [3] 
presents global error bounds for the discrete fluid state in the case of the implicit backward-Euler scheme. 
This time integrator may be of little practical importance to CFD, but the developed error bounds highlight 
the merits of the principles underlying the construction of a GNAT ROM. Then, Section [5] proposes a simple 
yet effective implementation of GNAT's online stage on parallel computing platforms. This implementation 
features the concept of a 'sample mesh', which is a carefully chosen tiny subset of the original CFD mesh on 
which all online GNAT computations are performed. The sample mesh has few connectivity requirements 
and is therefore easily amenable to partitioning for parallel distributed computations. Most importantly, 
the implementation can be tailored to any specific CFD scheme or software, and to the fast computation 
of outputs such as pressure coefficients, lift, and drag. Section [6] demonstrates the potential of GNAT 
to effectively reduce the dimension and complexity of highly nonlinear CFD models while maintaining a 
high level of accuracy. In Section |6.1[ GNAT is applied to a parameterized hyperbolic problem featuring a 
moving shock; GNAT's online performance is tested for parameter values different from those used to collect 
simulation data offline. Section |6.2| demonstrates the potential of GNAT for challenging CFD applications 
with the fast solution of a benchmark turbulent flow problem with over 17 million unknowns. For this 
problem, GNAT outperforms many of the aforementioned nonlinear model-reduction methods. It reproduces 
the solution delivered by the high-dimensional CFD model with less than 1% discrepancy, while reducing 
the associated computational cost in core-hours by more than two orders of magnitude. Finally, Section [7] 
offers conclusions. 



2. Problem formulation 

2.1. Parameterized nonlinear CFD problem 

Consider the ODE resulting from semi-discretizing the conservation form of the compressible Navier- 
Stokes equations by a finite difference, finite volume, or stabilized finite clement method — possibly aug- 
mented by a turbulence model — and a given set of boundary conditions: 

dw „ , , . . 
tu(0) = w°(n). 

Let 

z = H(w(t), t i) 

= m (2) 

denote the outputs of interest that may include the lift, drag, and other quantitites. 

Here, t £ R + denotes time, w £ M. N denotes the discrete fluid state vector (i.e., the vector of discrete 
conserved fluid variables), N designates the dimension of the semi-discretization and is typically large, 
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w : K. — > K denotes the parameterized initial condition, and F : R N x R+ xff 1 -) R N is the nonlinear 
function arising from the semi-discretization of the convective and diffusive fluxes and source term (when 
present). The d input parameters, which may include shape parameters, free-stream conditions, and other 
design or analysis parameters of interest, are denoted by fj, € R d . The (feasible) input-parameter domain 
is denoted by V, and therefore \i £ V C U d . H : U N x R d ->■ W and L : (p) i-» H(w(t; fj,), fi) define two 
equivalent mappings for the output vector z £ W. 

In this work, attention is occasionally focused on a finite-volume semi-discretization method operating 
on a dual CFD mesh. Consequently, the sampling concepts discussed in Sections [3] and [5] are node/cell 
oriented. However, all concepts, algorithms, and techniques presented in this paper are easily extendible to 
finite-difference and stabilized finite-element semi-discretization methods. 

2.2. Objective: time- critical analysis 

Consider the following objective: given inputs p, € T>, compute fast approximations of the outputs 
L(fi) fa L(jl), where "fast" is defined in one of the following senses: 

1. The evaluation takes a sufficiently small amount of time. This is relevant to applications that demand 
near-real-time analysis, where the objective is to compute outputs in a time below a threshold value; 
the number of computational cores required to perform the analysis is not a primary concern. Examples 
include flow control and routine analysis, where the analyst may require an answer within a given time 
frame. 

2. The evaluation consumes a sufficiently small amount of computational resources, as measured by com- 
putational cores multiplied by time. This is relevant to many-query applications, where the objective is 
to evaluate the model at as many points in the input space as possible, given a fixed amount of wall time 
and processors. Examples include aerodynamic shape optimization and uncertainty quantification. 

When the number of degrees of freedom N of the high-dimensional CFD model is sufficiently large, solving 
the state equations ([!]) with fj, = /t and subsequently computing the outputs by Eq. ^ becomes prohibitively 
time- and resource-intensive for time-critical applications. 

Instead, the following two-stage offline-online strategy can be employed. During the offline stage, 
Eq. (fl]) is solved for e?train points in the input-parameter domain; this defines the training domain ©train = 
{a^Ij^i" C V. The data acquired during these training simulations are then used to construct a surrogate 
model that is capable of rapidly reproducing the behavior of the high-dimensional model at arbitrary points 
in T>. The online stage uses this surrogate model to perform time-critical analysis for inputs ji with ji ^ ©train 
in general. 

In contrast to surrogate-modeling techniques based on data-fit approximations (e.g., response surfaces) of 
the input-output map L(fi), model-reduction methods take a more physics-based approach. These techniques 
aim to achieve time-critical analysis by approximately solving the (physics-based) state equations for the 
online inputs ji and then computing the resulting outputs. The next section provides an overview of the 
GNAT model-reduction method developed for this purpose. 

3. Overview of the GNAT model-reduction method 

To keep this paper as self-contained as possible, this section provides an overview of the Gauss-Newton 
with approximated tensors (GNAT) nonlinear model-reduction method first proposed in Ref. [T]. 

3.1. Computational strategy and numerical properties 

Nonlinear model reduction is often performed in a somewhat ad hoc manner. As a result, nonlinear 
ROMs often lack important numerical properties. To avoid this pitfall, the design of the GNAT method 
employs a computational strategy that constructs approximations to meet conditions related to notions of 
discrete optimality and consistency. 
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The design of GNAT is based on the premise that if a given computational model is unaffordablc for a 
given time-critical application, approximations can be introduced to this model to make it more economical 
yet retain an appropriate level of accuracy. This premise leads to a hierarchy of models characterized by 
tradeoffs between accuracy and computational complexity. Then, the objective is to construct computa- 
tionally efficient approximations that introduce minimal error with respect to the previous model in the 
hierarchy. GNAT achieves this objective by relying on the concepts of discrete optimality and consistency 
introduced in Ref. pQ. Both concepts are recalled below. 

Discrete- optimal approximation: Here, an approximation is said to be discrete optimal if it leads to approxi- 
mations that — at the discrete level — minimize an error measure associated with the previous model in the 
hierarchy. This ensures that some measure of the error in the discrete approximation decreases monotonically 
as the approximation spaces expand (a property also referred to as a priori convergence). 

Consistent approximation: Here, an approximation is said to be consistent if, when implemented without 
snapshot compression, it introduces no additional error in the solution at the training inputs. 

Figure [I] depicts the model hierarchy employed by GNAT. This hierarchy consists of three computational 
models: the original tier I high-dimensional model and two increasingly approximated models referred to as 
the tier II and tier III (reduced-order) models, respectively. Each of these reduced-order models is generated 
by 1) acquiring snapshots from simulations performed for training inputs 2?train using the previous (i.e., more 
accurate) model in the hierarchy, 2) compressing the snapshots, and 3) introducing an approximation that 
exploits the compressed snapshots. 



I. 



high-dimensional model -*■ snapshot collection 



II. Petrov-Galerkin (PG) 



projection (reduce dimension) 

snapshot collection ■ 



compression 
I 



compression 
1 



hyper-reduction (reduce complexity) 



III. 



Gauss-Newton with approximated tensors (GNAT) 



Figure 1 Model hierarchy with approximations shown in red. 



3.2. Fully discrete computational framework 

As stated in the introduction, GNAT operates at the discrete level. That is, the method introduces 
approximations after ODE (JlJ has been discretized in time. Hence, the ROM constructed by GNAT is a 
low-dimensional algebraic system that governs the solution of ODE ([I]) at each time step; it cannot generally 
be expressed as a low-dimensional ODE. As such, GNAT may be less convenient than other model reduction 
methods: the ROM it produces is valid only for the time-integrator adopted for the high-dimensional model. 
However, Section [3~3| will show that the fully discrete framework enables GNAT to achieve discrete optimality. 

Throughout the remainder of this paper, it is assumed that Eq. ([I]) is solved by an implicit linear multi- 
step time integrator. In this case, if nt time steps are carried out, a sequence of n t systems of nonlinear 
equations arises. Each of these systems can be written as 

R n {w n+1 ;n) = (3) 

for n = 0, . . . , fit — 1) with outputs 

z = G(w°,...,w n *, fJ ,). (4) 

Here, a superscript n designates the value of a variable at time step n, the operators R n : M. N x R d — > M. N 
for 1% = 0, . . . , rit — 1 are nonlinear in both arguments, and G : R N x • • • x x R d — > W p . The fluid state 
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vectors w n , n — 1, . . . , n t are implicitly defined by Eq. ([3| for a given /i, and w° = w ([i) is given by the 
initial condition. 

For simplicity, consider one instance of Eq. |3| denned by one time instance and one vector of input 
parameters. Such an instance can be written as 

R{w) = 0. (5) 

Here, the fluid state vector w 6 is implicitly defined by Eq. and R : R N — > R N with w i-> R(w) is 
a nonlinear mapping. In the remainder of this paper, Eq. Q is associated with the high-dimensional CFD 
model (tier I in Figure [T]). 

3.3. Petrov-Galerkin projection 

To reduce the dimension of Eq. ([5| , the GNAT method employs a projection process. This leads to the 
tier II ROM in the model hierarchy. Specifically, GNAT seeks an approximate solution w to Eq. ^ in the 
affine trial subspace w° + W C of dimension n w <C N. Hence, w can be written as 

w = w° + $ w w r , (6) 

where Q w € M Arxn ™ is a matrix representing an n w -dimensional basis for W, and w r € M n ™ denotes the 
generalized coordinates of the fluid state vector in this basis. Note that the increment in the state w — u>° 
and not the state itself is sought in the subspace W. This is an important consideration when defining the 
basis & w as will be described in Section [3. 3.2[ 



3.3.1. Discrete optimality 

Substituting Eq. ^ into Eq. ^ yields R(w° + $ w w r ) — 0, which represents an over determined system 
of N equations in n w unknowns. Consequently, the GNAT method computes w as the solution to the 
minimization problem 

minimize ||i?(w)|| 2 . (7) 

GNAT solves this nonlinear least-squares problem by the Gauss-Newton method, which is globally convergent 
under certain assumptions. This method delivers a solution that is discrete optimal at each time step: the 
solution minimizes the discrete residual associated with the tier I model over the trial subspace. This residual- 
minimization approach is mathematically equivalent to performing a Petrov-Galerkin projection with a test 
OR 

basis corresponding to — — $ m (see Ref. IT]). Note that the fully discrete framework described in Section 3.2 

aw I 
enables discrete optimality to be achieved: the test basis depends on the discrete residual and thus depends 

on the time integrator, so it is not defined at the semi-discrete level. 

Ref. [T] numerically demonstrated the superior accuracy delivered by the tier II Petrov-Galerkin ROM 

associated with Eq. Q compared with its counterpart based on a (more commonly used) Galerkin projection 

when applied to a non-self-adjoint problem characterized by an unsymmetric residual Jacobian, which is 

typical in CFD. This strong performance is likely due to the discrete optimality property, which Galerkin 

projection lacks for such problems. However, for the relatively small class of CFD problems characterized 

by a symmetric residual Jacobian, a Galerkin projection is also discrete optimal, as it minimizes the discrete 

residual, albeit for a different norm pQ. 



3.3.2. Consistency 

To ensure a consistent projection, the basis <& w can be computed by proper orthogonal decomposition 
(POD) using a specific set of snapshots collected from simulations performed using the tier I CFD model 
at the training inputs. Specifically, given a snapshot matrix W € ^ Nxn w ; a POD basis $ G jj^xn* Q f 
dimension n$ < nw is obtained by first computing the (thin) singular value decomposition (SVD) 

W = UY,V T , (8) 



G 



where the superscript T designates the transpose, the left-singular- vector matrix U = [u 1 ■ ■ ■ u nw ] £ 
R Nxnw satisfies U T U — I, the singular-value matrix £ = diag(cri) satisfies o\ > a% > ■ ■ ■ > a nw > 0, 
and the right-singular- vector matrix V £ M. nwXnw satisfies V T V = I. Then, the sought-after POD basis 
is obtained by selecting the first n$ < nw left singular vectors: $ = [it 1 •■■ u n *l. As a result, $ has 
orthonormal columns and satisfies $ T $ = I, Often, n$ is determined from an energy criterion such that the 
POD basis captures a fraction of the statistical energy of the snapshots. 

Ref. PQ proved that the Petrov-Galerkin projection defined by Eq. ¥f\ is consistent when Q w is computed 
by POD with snapshots of the form {w n (fi) — w n (°\(i) \ n = 1, . . . ,nt, fi £ ©train}, where io n W = w n ~ x , 
n = 1, . . . , n t denotes the initial guess for the Newton solver. This implies that the snapshots used for POD 
should correspond to the solution increment at each time step of the training simulations. This remark is 
noteworthy because most nonlinear model-reduction techniques reported in the literature employ a POD 
basis computed using snapshots {u; n (/Lt) | n = 0, ...,n tl fj, £ ©train}) which do not lead to a consistent 
projection. 

Here, an alternative snapshot-collection procedure is proposed: {u>"(/i) — w°(ii) | n = I, . . . ,n t , n £ 



©train}- These snapshots also lead to a consistent projection under certain conditions. Appendix A discusses 
these conditions and contains the proof that both snapshot-collection procedures lead to a consistent Petrov- 
Galerkin projection. 

3.4- Hyper-reduction 

Solving the least-squares problem Q by the Gauss-Newton method leads to the following iterations: for 
k = 1, . . . , K, solve the linear least-squares problem 



s^=arg min || J^$ w a + R^\\ 2 (9) 

and set 

(10) 

where K is determined by the satisfaction of a convergence criterion, wf^ is the initial guess (often taken 
to be the generalized coordinates computed at the previous time step), and = R + Jmii^'J and 
OR / \ 

= — — (w° + are the nonlinear residual and its Jacobian at iteration fc, respectively. The step 

dw V / 

length cS k ^ is computed by executing a line search in the direction to ensure convergence, or is set to the 
canonical step length of unity. Even though the dimension of the trial subspace is small, the computational 
cost of solving the above nonlinear least-squares problem scales with the dimension N of the tier I high- 
dimensional CFD model. As mentioned in the introduction, this is the computational bottleneck faced 
by many (if not all) projection-based nonlinear model reduction techniques. The role of hyper-reduction 
(referred to as system approximation in Ref. [1]) is to decrease this computational cost. 

3.4.1. Optimality 

To address the performance bottleneck identified above, GNAT employs the gappy POD data recon- 
struction technique [31]. In the context of GNAT, gappy POD leads to approximations of the one- and 
two-dimensional tensors R( k > and J^ k ^ w , respectively, by computing only a small subset of their rows. 
Denoting by 1 = {ii, h, ■ ■ ■ , i ni } C {1, . . . ,N} the set of Hi sample indices for which these functions are 
evaluated, the sample matrix is defined as 

Z = Z(X) £ R Nxn % (11) 

where 

^{y)= [e yi •■■ e y J T 

ysK-yJ, (12) 
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and ei is the «th canonical unit vector. 

Given these sample indices and bases £ 
j( k '$ w via gappy POD as follows: 



vNxi 



• R and $j € E A,xnj , GNAT approximates #<» and 



[Z$ R ] 

$j [Z*j]" 



(13) 
(14) 



where the superscript + designates the Moore-Penrose pseudo-inverse. Approximations ( 13 1— ( 14 ) are dis- 
crete optimal in the sense that the error measures \\ZR^ - ZR^\\ 2 and \\ZjW>$ w ~ ZJW$ W \\ F monoton- 
ically decrease as columns are added to $7? and respectively. 

Substituting RW = R,W and J^$ w = JW$ W in Eqs. (9 )-( 10 1 and assuming that = I nj — which 

is easily achievable by computing <&j using POD — the tier II Petrov-Galerkin iterations are transformed 
into the tier III GNAT iterations 



3 (fc) 



arg min \\AZJ {k) <S> n 



BZRW\\ 2 (15) 

(16) 

XTli can be computed a priori, 



Here, the matrices A = [Z$ 7 ] + e M. n -' xn * and B = [2"$ fl J + e R n - 

during the offline stage. 

Note that in CFD, the Jacobian matrix is usually sparse. Hence, computing ZRy^> and Zj( k ^$ w does not 
require access to all entries of the fluid state vector. For this reason, let J denote the minimum-cardinality 
set of indices of the fluid state vector that influences the residual entries corresponding to sample-index set 
X. GNAT requires access to only the 'masked' state Z 1 Zw, where Z = Z (J) and nj = \J\. The algebraic 
products implied by this masked state vector can be obtained by evaluating only the J entries of the state 
w. 

After the index sets I and J are determined (see Section 5.2 for a discussion on determining these index 
sets), the online GNAT iterations executed at each time step can proceed as follows: 

1. Compute Zw^ = Zw° + Z<$> w wi , which requires updating only n s entries of the state. 

dR / - - \ — — 

2. Compute C (k) = Z— [Z 1 ' Zw {k) J Z T Z<$> W and £)W = ZR(Z T Zw^), which necessitates computing 
only Hi rows of R^ and JW$ W , respectively. 



Compute the low-dimensional products AC^ and BD^ k \ 



Solve the reduced-order least-squares problem s^ k ' 



5. Update the n w generalized coordinates M)f +1 ' using Eq. (16 1 



arg mm 



\ACWv + BDW 



Because none of the above computations scales with the large dimension N of the high-dimensional CFD 
model, the cost of the online stage of GNAT is typically very small. 



3.4.2. Consistency 

To ensure consistency in the hyper-reduction procedure outlined above, the bases $>r and <&j can be 
constructed using POD with snapshots that verify certain properties. For example, Ref. pQ introduced 
three conditions on the snapshots that together ensure consistency. These conditions lead to a hierarchy of 
snapshot-collection procedures that trade consistency for more affordable offline resources such as core-hours 
and storage. Table [l] summarizes these procedures. 

Procedure 3 ensures a consistent hyper-reduction because it satisfies all three conditions that together 
are sufficient for consistency. However, it is infeasible for most problems as it requires storing either n w + 1 
vectors or the (sparse) high-dimensional residual Jacobian at each Newton step of the training simulations. 
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I 


2 
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Snapshots for 




R u 






Snapshots for j( k ^$ w 


R^ 


R (k) 






# of simulations per training input 


1 


2 


2 


2 


# of snapshots per Newton iteration 


1 


1 


2 


+ 1 


# of conditions for consistency satisfied 





1 


2 


3 



Table 1 Snapshot-collection procedures for the tier III GNAT ROM. The indicated snapshots are saved at each 
Newton iteration for the training simulations. Subscripts I and II specify the tier of the model for which the 
snapshots are collected. 



Procedure 2 does not provide a consistent hyper-reduction, although it satisfies two of the three con- 
sistency conditions. Furthermore, it is computationally feasible as it requires saving only two vectors per 
Newton step. 

Procedure 1 is more economical than procedure 2, as it requires saving only one vector per Newton 
iteration and it computes one fewer POD basis. Further, procedure 1 uses the same POD basis for the residual 
and its Jacobian; when this occurs, the GNAT ite rations are equ ivalent to the Gauss-Newton iterations for 



minimizing which can abet convergence (see |Appendix BJ . However, procedure 1 satisfies only one of 
the three consistency conditions. 

Procedure requires performing only one tier I simulation for each training input and therefore is similar 
to conventional approaches for collecting snapshots [251 1221 [Ml [32 however, it satisfies none of the 
aforementioned consistency conditions. |Appendix B| offers an additional discussion of these snapshot- 
collection procedures. 

3. 5. Computation of outputs 

After GNAT computes generalized coordinates u>™, n = 1, . . . , n t for online inputs fi £ T>, the outputs 
z can be computed. Because the objective is to ensure that no online computation scales with the large 
dimension N of the high-dimensional CFD model, an alternative method to computing the outputs via Eq. 



(17) below is needed: 

z = G(w Q ,w Q + § a w*, ...,w° + (17) 

Indeed, this expression implies matrix-vector products of the form $ w y that entail 0(Nn w ) operations. 

In general, the outputs cannot be computed during the online stage of a GNAT simulation, because the 
outputs may depend on entries of the state vector that are not included in J . For example, the drag force 
exerted on an immersed body depends on the conserved fluid variables at all nodes located on its wet surface, 
but the index set J will not generally include all of these indices. 

Instead, the outputs can be efficiently computed in a post-processing step that accesses only the computed 
generalized coordinates w" for n = 1, . . . ,n t , and the rows of the initial condition w° and POD basis & w 
needed for the desired output computation. To this effect, let K, denote the minimum-cardinality set of indices 
of the fluid state vector that affects the output computation. Given generalized coordinates computed by 
GNAT online, the outputs can be computed as 

z = G(Z T Zw°,Z T Zw a + Z T Z<f> w wl, Z T Zw° + Z 7 'Z^ w w^ , fl), (18) 

where Z = Z (]C) and ni< = \ JC\. This approach entails products of the form Z? Z§ w y that require performing 
computations with only the K, rows of <& w and incur 0(n^n w ) operations. This operation count is small if 
Uk <C N. Fortunately, this condition holds in the case of spatially local outputs such as the value of flow 
variables at several points in the domain, or the lift and drag, which are associated with the wet surface of 
a body. This condition does not hold for spatially global outputs. 
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3.6. Offline-online decomposition 

In summary, GNAT builds a 'global' ROM — that is, a ROM trained at multiple points in the input- 
parameter space — and performs a ROM simulation in two stages as follows: 

Offline stage 

1. Perform tier I simulations at various training inputs 2?train- Collect snapshots of the fluid state vector 
(and snapshots of the residual if using snapshot-collection procedure of Table[T]) . during these training 
simulations. 

2. Compute POD basis $> w using the collected fluid-state snapshots. 

3. If snapshot-collection procedure 1, 2, or 3 is employed, perform tier II simulations at training inputs 
Strain- Collect snapshots for the residual and its Jacobian during these training simulations as specified 
by Table [Tj. 

4. Compute POD bases and $j using the collected snapshots. To ensure the matrix AZj( k '$> w has 
full rank, enforce nj > n w . 



5. Determine the sample-index set I (for example, see Section 5.2), and consequently index set J . To 
ensure uniqueness for the gappy POD approximations, enforce > ur and n.; > nj. 

6. Compute matrices A = and B = Q^Qr [Z$r] + to be used during online computations. 

7. Determine the index set K, related to output computation. 

8. Using index sets X, J ', and K, construct the associated sample meshes (see Section[5]) — which are tiny 
subsets of the CFD mesh — on which to perform the online stage of a GNAT simulation as summarized 
below. 

Online stage 

1. Apply Algorithm [T] to perform the GNAT ROM simulation for inputs p, G T> specified online. 

2. Apply Algorithm [2] to compute the desired outputs. 



4. Error bounds 

Here, error bounds are developed for any discrete nonlinear model-reduction method assuming that time 
discretization is performed using the backward-Euler scheme. These bounds highlight the advantages of the 
GNAT method, as it minimizes components of these error bounds. 

When Eq. ([I]) is time discretized using the backward-Euler scheme, the residual corresponding to time 
step n, input parameters /i, and the sequence of states computed by the high-dimensional CFD model u>™, 
n = 0, . . . , tit can be written as 

R n (w n+1 ; t i) = w n+1 - w n - AtF(w n+1 ,t n+1 ; fi). (19) 



Proposition 4.1. Assume f : (u>, t; fi) t— > w—AtF(w, t; /i) satisfies the following inverse Lipschitz continuity 
condition for the online input p, £ T> 

\\f(w,t n -jl)~f(y,t n ;il)\\ 

ii n > e > 0, Vne{l,...,n t }. (20) 
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Algorithm 1 Online step [T] GNAT ROM simulation 

Input: online matrices A and B, online inputs ji € T>, initial condition Zw°(fl), and state POD basis Z<1> 
Output: generalized coordinates w™(fi), n = 1, . . . , nt 
1: for n = 0, . . . , rif — 1 do 

2: Choose initial guess uir +l (e.g., u>™ +1< -°' «— iu™ with = 0). 
3: k <- 

4: while not converged do 
5: Compute 

C n(k) = I 2 t 2w o + z T Z<P w w? +1 W;fr) Z T Z<$> W 

D n(k) = ZR n ^z T Zw° + Z T Z$ w w? +1( -V;if) . 

6: Compute 

s n+i(k) _ min \\AC n( - k) v + BD n ^\\ 2 

w n+l(fe+l) = +a n+l(fc) s n+l(fe) ) 

where a n+1 ^ is computed via line search search or is set to 1. 

7: Zw n+1( - k +^ <- Zw° + Z$> w W? +1[k+1) 

8: fc <- k + 1 

9: end while 

10: Set <— u'" +1( ' fc ' ) and save it; it will be used to compute outputs using Algorithm^ 

11: end for 
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Algorithm 2 Online step [2j computation of outputs 

Input: online inputs jx € T>, generalized coordinates io"(/t), n = l,...,n t , initial condition Zw {jx), and 

state POD basis Z$ w 
Output: outputs z 

for n = 1, . . . , nt do 

Zw n <r- Zw° + Z<Z> W W? 

end for 

Compute z = G{Z T Zw°, . . . , Z T Zw nt , #) 



Furthermore, assume that the high- dimensional CFD model employs the backward-Euler scheme for time- 
integration and computes states w n , n = 1, . . . , n t that satisfy an absolute tolerance for the residual 

\\R n (w n+1 ;fx)\\<e mwton , Vne{0,...,ra t -1}. (21) 

Then, for any sequence of states w n , n — 0, . . . , n t satisfying w° = w° , a global error bound for the 
approximation of the state at the n-th time step is given by 



< ^2 akb n-k < ^2 flfec «-fc - X! akdn - 

k=l k=l k = l 



(22) 



where 



\\ w - 2/ II 

sup sup 



n£{i,...,n t } w^ y \\f(w, t n ;fi)- f(y, t n ;fi)\\ 
b n = e Ncwtou + \\R n (w n+1 ;ji)\\ 

Cn = CNcwton + ||Pi? n (*" +1 ;/i)|| + ||(/- P)R n {w n+1 -Ji)\\ (23) 

d n = CNewton + \\PRT \ ji) \\ + H^HH (/ - P) & (w n+1 ; fi) \\ 

p = <p r {z<p r } + z 

p = 

Z$ R = QR, 

where Q £ IT**™*, R e R««x»« ; a nd R n (w;n) = w - w n - AtF(w, t n ; fj). 

| Appendix C| provides a proof of the above error bounds. Their consequences include: 

• Justification for the minimum-residual approach taken by the tier II Petrov-Galerkin ROM. Namely, by 
computing w n+1 = arg min lli^fui; /t)||, the tier II Petrov-Galerkin ROM selects the element 

of the trial subspace that minimizes b ni n — 1, . . . , n t . This in turn minimizes the tightest error bound 
in ([22]). 

• Justification for using & R = $j in GNAT (snapshot-collection procedures and 1). In this case, the 
GNAT iterations are equivalent to applying the Gauss-Newton method for minimizing |j PR n (w n+1 ;ft)\\. 
As a result, GNAT computes w n+1 = arg min \\PR n (w; which is the element of the trial 

subspace that minimizes the second term of both c„ and d n , n = 1, . . . , nt- 

• Justification for computing & R via POD. When computed by POD, the basis is the orthogonal 
basis of dimension n R that minimizes the average projection error over the set of residual snapshots; 
this projection error appears as the last term of d n . 
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The tightest bound in ( 22 ) is computable by the tier II Petrov-Galerkin ROM if the Lipschitz constant 
a can be computed or estimated. This is due to the computability of b n : it requires only the tolerance 
ENewton and the residual norm at each time step. 

n n 

The bound ak Cn-k (resp. a k d n -k) is computable by GNAT if the Lipschitz constant a can be 



fe=i 



fe=i 



computed or estimated and the projection error P)R n (w n+1 ; (resp. || (I— V)R n (w n+1 ; jl)\\) can 



be computed or estimated. This is due to the computability of \\PR n (w n+1 ; 
The projection error ||(/ — P)R n (w n+1 ] fi)\\ can be estimated by 



\(I-P)R n (w 



n ( ~ n+1 . 



\(& R [Z& R ] + -$ R [Z$j 



[ZSrY ZR n {w n+1 ;P)\\ 
ZR n (w n+1 ;jl)\\ (24) 







[Z^rV 



ZiT(u;™ +i ;/i) 



where 



for some n' R > ur (following Ref. [33] ) . Alternatively, the projection error 

| (I — F)R n (w n+1 ; p,)\\ can be approximated by the sum of the squares of the singular values neglected 
by <f> R (following Ref. HO). 



5. Implementation of online computations and post-processing 

5.1. Sample mesh concept 

A quick inspection of Algorithm [T] reveals that the online stage of the GNAT method performs CFD 
computations in only Step [5] of this algorithm. Furthermore, these computations require, manipulate, and 
generate information only at the nodes of the CFD mesh associated with the index sets I and J described 
in Section 3.4.1 Similarly, an inspection of Algorithm [2] for output computation reveals that this algorithm 



requires online access only to information pertaining to the nodes of the CFD mesh associated with the 
index set K. described in Section |3.5| From these two observations, it follows that both online algorithms 
can be effectively implemented by constructing in each case a 'sample mesh' tailored to the computations to 
be performed. This concept is related to the subgrid idea recently proposed in [231 131)] , but differs from it 



primarily in the node sampling algorithm as discussed in Section 5.2 

For the purpose of building and exploiting the GNAT ROM, the sample mesh used to execute Algorithm 
[I] must contain only the nodes associated with index sets I and J, and the geometrical entities (e.g., 
edges, faces, cells) associated with these nodes. For example, consider the case where the flow solver of 
interest operates on unstructured tetrahedral meshes and is based on a second-order finite-volume spatial 
discretization, where the discrete unknowns are located at the mesh nodes and the fluxes are computed across 
the boundaries of control volumes. Here, the control volumes (or dual cells) are constructed by connecting 
the centroids of the triangular faces and the midpoints of the edges. The union of these control volumes 
is often referred to as the dual CFD mesh. In this case, the sample mesh is constructed by assembling the 
nodes associated with the index sets I and J ', and the edges, faces, and cells required to allow the same 
CFD solver to perform the computations in Step [5] of Algorithm [l] as if it were operating on the original 
CFD mesh. Figure [2] illustrates this case in two dimensions. Note that although the sample mesh shown in 
this figure is simply connected, this property is not required. The reason for this is that the sample mesh 
has no specific geometrical meaning and therefore no connectivity requirement aside from that imposed by 
the stencil of the flow solver's spatial discretization scheme. 

The second sample mesh used to execute Algorithm [2] for output computation must contain only the 
nodes associated with the index set /C, and their corresponding geometrical entities. For example, when the 
desired outputs are the lift and drag, the sample mesh is the wet surface mesh — that is, the collection of 
faces and nodes lying on the obstacle around which the flow is computed — and any other geometrical entity 
the flow solver requires to compute the outputs. 

As previously mentioned, one major advantage of the sample-mesh approach is that it allows the same flow 
solver that was used for offline, high-dimensional CFD computations to be used for online, low-dimensional 
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Figure 2 Original and sample CFD meshes for online GNAT computations based on a second-order finite 
volume method operating on a dual mesh. The original mesh is defined by all triangles, control volumes 
(dashed lines), edges, and nodes. The sample mesh is defined by the gray-shaded triangles, associated edges, 
all control volumes fully contained within the gray region, and the red, blue, and green nodes. The residual 
and its Jacobian are computed at the red node (which defines I) , and the fluid state vector is computed at 
the red, blue, and green nodes (which together define J\ 



computations. Consequently, the online ROM computations can be automatically distributed and paral- 
lelized in the same manner as their large-scale CFD counterparts. The ROM's performance can also be 
expected to scale (in the weak sense, or the scaled speedup metric) in a similar manner to that of the large- 
scale flow computations using the same CFD solver. However, because its size is typically a tiny fraction 
of that of the original CFD mesh, the sample mesh can be expected to require significantly fewer computa- 
tional cores and lead to simulations requiring far fewer core-hours than its large-scale counterpart; Section 
16.21 demonstrates this. 

Determining the index set K. is a trivial task. For this reason, constructing the sample mesh for output 
computation is also straightforward. However, determining the sample-index set I at which to compute the 
residual and its Jacobian is a more delicate matter that is discussed next. 



5.2. Underlying node sampling algorithm 

Several algorithms have been proposed in the literature for selecting the sample indices that define the 
sample matrix Z . Usually, these algorithms are tailored to the hyper-reduction procedure they are designed 
to support. For example, for the various forms of empirical interpolation outlined in Section [T] several 
algorithms for selecting sample indices have been developed around the objective of minimizing the error in 
the interpolated snapshots [22, 32(55], the difference between the interpolated snapshots and their orthogonal 
projections onto the subspace of approximation |37l 132) . and the condition number of the normal-equations 
matrix used for interpolation or least-squares approximation [38 , 29 . However, because these algorithms are 
sensitive to differences in scale between different conservation equations, they are not particularly suitable for 
CFD applications, as they would lead to a biased treatment of the multiple conservation equations defined 
at each mesh node. For this reason, Ref. [33] applied the greedy sampling algorithm adopted in Refs. [551 
1311 125] to each conservation equation separately. However, this approach causes the conservation equations 
to be sampled at different sets of nodes. This not only complicates the implementation of online CFD 
computations, it also leads to a larger subgrid than necessary and therefore to computationally suboptimal 
nonlinear ROM simulations. 

Here, Algorithm [3] is proposed for determing the sample nodes from a given CFD mesh, and therefore 
constructing the sample-index set X and sample matrix Z . This algorithm is based on the greedy method 
presented in Refs. [55J 132 155] . This choice is made because this method attempts to minimize the error 
|| (I— P)R n (w n+1 \ p)\\ associated with the gappy POD projection of the residual, and therefore the third term 
of the coefficient c n (23) characterizing the error bound (22). However, Algorithm [3] distinguishes itself from 
the method described in Refs. [HI [32 [55J in a few noteworthy aspects. First, it allows for overdetermined 



14 



least-squares matrices as opposed to relying on interpolation of the nonlinear function. Secondly, it allows 
different bases to be used to approximate the residual and its Jacobian. Finally, it operates directly on the 
mesh nodes instead of the algebraic indices. Thus, the sample-index set 2" consists of the degrees of freedom 
associated with the nodes in set Af = {r^, n 2 , . . . , n n J. Because of this latter minor albeit distinctive feature, 
Algorithm [3] treats all conservation equations in a balanced manner, does not lead to a larger-than-necessary 
sample mesh, and therefore does not introduce from the outset a computational inefficiency in the online 
ROM computations. 

Remark 2 The sample-node set Af can be seeded with nodes of the CFD mesh that are deemed important 
due to their strategic locations. In particular, it is essential that at least one sample node lies on the inlet 
or outlet boundary of the problem, if such a boundary exists. It is equally essential that each input variable 
Hi, i — 1, . . . , d affects the value of the residual at at least one sample node. If the above conditions are not 
met, the hyper-reduced GNAT model will be blind to the boundary conditions and inputs [TO] . 



Algorithm 3 Greedy algorithm for selecting sample nodes from a given CFD mesh 

Input: <f>fl; <i>j; target number of sample nodes n s ; seeded sample-node set Af (see Remark 2); number 
of working columns of $^ and $j denoted by n c < min(n^, nj, vn s ), where v denotes the number of 
unknowns at a node (for example, v = 5 for three-dimensional compressible flows without a turbulence 
model) . 
Output: sample-node set M 

Compute the additional number of nodes to sample: n a = n s — \Af\ 
Initialize counter for the number of working basis vectors used: rib <— 
Set the number of greedy iterations to perform: na — min(n c ,n a ) 

Compute the maximum number of right-hand sides in the least-squares problems: tirhs = ccil(?i, c /rt a ) 
Compute the minimum number of working basis vectors per iteration: n C i. m i n = floor(n c /nit) 
Compute the minimum number of sample nodes to add per iteration: n ai min = floor(n a nRHs/ n c) 
for i = 1, . . . , nit do {greedy iteration loop} 

Compute the number of working basis vectors for this iteration: n C i <— n C i jm i n ; 
if (i < n c mod nit), then n C i <— n C i + 1 

Compute the number of sample nodes to add during this iteration: n a i -s— n a i. m j n ; 
if (riRHS = 1) and (i < n a mod n c ), then n ai 4- n al + 1 
if i = 1 then 



[J 1 

else 

for q = 1 

end for 
end if 

for j = l,. 



do {basis vector loop} 



b n b+q 



4 



6^] a, with a 



b n j b \ /3, with /3 



= arg mm 

7GR"t> 

arg min 

7GR™!) 



z<p?h- 



- z<t> R b+q 

Z(f> n j b+q 



do {sample node loop} 



Choose node with largest average error: n 4— arg max Y] 



E ((Rf) 2 




where 6(1) denotes the degrees of freedom associated with node I. 

Af <— Nil {n} 
end for 
n b i- n b + n ci 
end for 
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Table 2 Offline and online inputs for the IBVP ( 25 HI 27 1 



Input variables 


Training input #1 


Training input ^2 


Training input #3 


Online input 


M 1 




M 3 




a 


3 


6 


9 


4.5 


b 


0.02 


0.05 


0.075 


0.038 



6. Applications 

To illustrate the ability of GNAT to reduce the dimension and complexity of highly nonlinear CFD models 
while maintaining a high level of accuracy, this section considers two examples. The first one is an academic 
problem based on Burgers' equation. It features a moving shock, and therefore highlights GNAT's potential 
for unsteady CFD problems with moving discontinuities. In this one-dimensional example, GNAT is applied 
in a prediction scenario — that is, for the (most relevant) case where the values of the input variables 
change between the offline training simulations and the online simulation. The second example pertains to 
the computation of the Ahmed body wake flow |41j . which is a well-known CFD benchmark problem in 
the automotive industry. The CFD model employed for this three-dimensional problem is characterized by 
millions of unknowns and therefore incurs time-consuming offline computations. For this reason, GNAT is 
applied in this example in reproduction mode only — that is, for the (preliminary) scenario where the online 
input-variable values are identical to their training counterparts. Nevertheless, this example demonstrates 
GNAT's performance on a realistic, large-scale turbulent flow problem, and contrasts it with that of other 
nonlinear model-reduction methods. 

6.1. Parameterized inviscid Burgers' equation 

This numerical experiment employs the problem setup described in Ref. |27j . Consider the parameterized 
initial boundary value problem (IBVP) 



dU(x,t) ld(U 2 {x,t)) 



0.02e 6a; (25) 



dt 2 dx 

U(0,t) = a, Vf > (26) 
U(x,0) = 1, Vx G [0, 100] , (27) 

where a and b are two real- valued input variables. This problem is discretized using Godunov's scheme, which 
leads to a finite- volume formulation. The one-dimensional domain is discretized using a grid with 4001 nodes 
corresponding to coordinates coordinates X4 — i x (100/4000), i — 0, . . . ,4000. Hence, the resulting CFD 
model is of dimension N = 4000. The solution U(x,t) is computed in the time interval t € [0, 4000] using 
a uniform computational time-step size At = 0.05, leading to n t = 1000 total time steps. Because there is 
only one unknown per node, each sample node corresponds to a single sample index. 

First, a GNAT model is constructed using snapshot-collection procedure 2 (see Table [TJ and the following 



parameters: n w — 50, tir = 160, nj = 70 and rii = 160. It is trained for the solution of the IBVP ( 25 1— ( 27 ) 
using the the values of the boundary-condition parameter a and source-term parameter b reported in columns 
2-4 of Table H 



Next, the resulting global GNAT ROM is applied online to the solution of the IBVP (|25j)-[27|) configured 
with the new values of a and b shown in column 5 of Table [2] A reference solution for this problem is also 
computed using the high-dimensional CFD model. Both solutions are graphically depicted in Figure [3] and 
were computed using a single processor. 

The reader can observe that the GNAT prediction closely matches the reference solution in general. 
Although oscillations in the GNAT solution are apparent at t = 2.5, they dissipate over time. The relative 
time-averaged discrepancy between the GNAT solution and the reference high-dimensional CFD solution as 
measured in the Euclidean norm of the state vector is only 1.26%. The high-dimensional CFD solution took 
1167 times longer to complete than the online GNAT solution; this showcases the improved CPU performance 
delivered by the GNAT ROM. 
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■ high-dimensional model, t = 2.5 

■ GNAT, t = 2.5 
high-dimensional model, t = 20 
GNAT, t = 20 

■ high-dimensional model, t = 45 
GNAT, i = 45 
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Figure 3 Performance of GNAT in predictive mode for the IB VP ( 25 H 27 1 



6.2. Ahmed-body wake flow 
6.2.1. Preliminaries 

To assess GNAT's performance on large-scale CFD applications, GNAT is implemented in the massively 
parallel compressible-flow solver AERO-F [351 E3]- F° r turbulent, viscous flow computations, this finite- 
volume CFD code offers various RANS and LES turbulence models, as well as a wall function. It performs a 
second-order semi-discretization of the convective fluxes using a method based on the Roe, HLLE, or HLLC 
upwind scheme. It can also perform second- and fourth-order explicit and implicit temporal discretizations 
using a variety of time integrators. The GNAT implementation in AERO-F is characterized by the sample- 
mesh concept described in Section [5j All linear least-squares problems and singular value decompositions 
are computed in parallel using the ScaLAPACK library [33]. AERO-F is used here to demonstrate GNAT's 
potential when applied to a realistic, large-scale, nonlinear benchmark CFD problem: turbulent flow around 
the Ahmed body. 

The Ahmed-body geometry |41j is a simplied car geometry. It can be described as a modified par- 
allelepiped featuring round corners at the front end and a slanted face at the rear end (see Figure |3|. 
Depending on the inclination of this face, different flow characteristics and wake structure may be observed. 
For a slant angle <p > 30°, the flow features a large detachment. For smaller slant angles, the flow reattaches 
on the slant. Consequently, the drag coefficient suddenly decreases when the slant angle is increased beyond 
its critical value of ip = 30°. Due to this phenomenon, predicting the flow past the Ahmed body for varying 
slant angles has become a popular benchmark in the automotive industry. 

This work considers the subcritical angle w. — 20° and treats the drag coefficient Cn = t r ,„ , — 5 

° ~ ° ^PooV^S. 6016x10^^ 

around the body as the output of interest. The free-stream velocity is set to Voo = 60 m/s, and the Reynolds 
number based on a reference length of 1.0 m is set to Re = 4.29 x 10 6 . The free-stream angle of attack is set 
to 0°. 



6.2.2. High- dimensional CFD model 

The high-dimensional CFD model corresponds to an unsteady Navier-Stokes simulation using AERO-F 's 
DES turbulence model and wall function. The fluid domain is discretized by a mesh with 2,890,434 nodes and 
17,017,090 tetrahedra (Figure [5]). A symmetry plane is employed to exploit the symmetry of the body about 
the x-z plane. Due to the turbulence model and three-dimensional domain, the number of conservation 
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Figure 4 Geometry of the Ahmed body (from Ref. [45]) 



equations per node is v — 6, and therefore the dimension of the CFD model is N — 17, 342, 604. Roe's 
scheme is employed to discretize the convective fluxes; a linear variation of the solution is assumed within 
each control volume, which leads to a second-order space-accurate scheme. 




Figure 5 CFD mesh with 2,890,434 grid points and 17,017,090 tetrahedra (partial view, ip = 20°). Darker 
areas indicate a more refined area of the mesh. 

Flow simulations are performed within a time interval t £ [0 s, 0.1 s], the second-order accurate implicit 
three-point backward difference scheme is used for time integration, and the computational time-step size is 
fixed to At = 8 x 10~ 5 s. For the chosen CFD mesh, this time-step size corresponds to a maximum CFL 
number of roughly 2000. The nonlinear system of algebraic equations arising at each time step is solved by 
Newton's method. Convergence is declared at the fc-th iteration for the n-th time step when the residual 
satisfies ||i?™( fc )|| < 0.001||i?™^ ^||. All flow computations are performed in a non-dimensional setting. 

A steady-state simulation computes the initial condition for the unsteady simulation. This steady-state 
calculation is characterized by the same parameters as above, except that it employs local time stepping 
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Figure 6 Time history of the drag coefficient predicted for <p = 20° using DES and a CFD mesh with TV = 
17,342,604 unknowns. Oscillatory behavior due to vortex shedding is apparent. 

with a maximum CFL number of 50, it uses the first-order implicit backward Euler time integration scheme, 
and it employs only one Newton iteration per (pseudo) time step. 

All computations are performed in double-precision arithmetic on a parallel Linux cluster^] using a variable 
number of cores. 

6.2.3. Comparison with experiment 

Ref. [41] reports an experimental drag coefficient of 0.250 around the Ahmed body for a slant angle of 
(p = 20°. Figure [6] reports the time history of the drag coefficient computed using the high-dimensional CFD 
model described in the previous section. Indeed, the time-averaged value of the computed drag coefficient 
obtained using the trapezoidal rule is Cd = 0.2524. Hence, it is within less than 1% of the reported 
experimental value. This asserts the quality of the constructed CFD model and AERO-F's computations. 
For reference, this high-dimensional CFD simulation consumed 13.28 hours on 512 cores. 

6.2.4- ROM performance metrics 

The following metrics will be used to assess GNAT's performance. The relative discrepancy in the drag 
coefficient, which assesses the accuracy of a GNAT simulation, is measured as follows: 

= FT~ra '■ — TTni (28) 

max C D i - mm Co™ 

n n 

where Cjy\ denotes the drag coefficient computed at the n-th time step using the high-dimensional CFD 
model (tier I model), and Cjy\\\ denotes the corresponding value computed using the GNAT ROM (tier III 
model) . 



2 The cluster contains compute nodes with 16 GB of memory. Each node consists of two quad-core Intel Xcon E5345 
processors running at 2.33 GHz inside a DELL Poweredge 1950. The interconnect is Cisco DDR, InfiniBand. 
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The improvement in CPU performance delivered by GNAT as measured in wall time is denned as 



WT = ^-, (29) 
J- in 

where Tj denotes the wall time consumed by a flow simulation associated with the high-dimensional CFD 
model, and T\u denotes the wall time consumed online by its counterpart based on a GNAT ROM. For 
the high-dimensional model, the reported wall time includes the solution of the governing equations and 
the output of the state vector; for the GNAT reduced-order model, it includes the execution of Algorithm 
[T] After the completion of Algorithm [T] Algorithm [2] is executed to compute the drag coefficient. This 
output-computation step employs a sample mesh based on nodes K. determined from the wet surface; it is 
characterized by 124,047 nodes and 492,445 tetrahedral cells. For all reduced-order models, Algorithm [2] 
consumed 12.2 minutes on 4 cores, or 9.7 minutes on 8 cores. 

The improvement in CPU performance delivered by GNAT as measured in computational resources is 
defined as 

CR=^p-, (30) 
cm J in 

where c\ and cm denote the number of cores allocated to the high-dimensional and GNAT-ROM simulations, 
respectively. 

As reported in Section [6. 2.3\ the high-dimensional CFD simulation is characterized by Tj = 13.28 hours 
and ci = 512 cores, which leads to c\T\ = 6, 798 core-hours. 

6.2.5. GNAT performance assessment 

This section assesses GNAT's performance for two different snapshot-collection procedures: procedures 
and 1 of Table [T] Recall from Section |3.4.2| that snapshot-collection procedure is inconsistent in the 



sense introduced in Ref. d and restated in Section 3.1 but is similar to the approach most often taken in 



the literature. Procedure 1 satisfies one consistency condition. Procedure 2 is not tested because Ref. [46] 
showed that it does not lead to robust reduced-order models; procedure 3 is not tested due to computational 
infeasibility. 

To build the state POD basis, consistent snapshots {w n — w }™^ with n t — 1252 are collected during 
high-dimensional CFD simulation. Then, these snapshots are normalized to prevent snapshots with large 
magnitudes from biasing the SVD. The dimension of the state POD basis is set to n w = 283, which corre- 
sponds to 99.99% of the total statistical energy of the (normalized) snapshots^] All numerical studies carried 
out on the Ahmed body employ this POD basis for the state. 

Algorithm [3] is employed to generate two sample meshes: one using the matrix = $ j generated by 
snapshot-collection procedure 0, and one using $^ . = <f>j generated by snapshot-collection procedure lj^] 
This algorithm employs the following parameters: n c = 219, n s = 378, and an initial sample-node set Af 
seeded with the boundary node whose entries of cj) l R have the largest sum of squares. Figure [7] depicts the 
two resulting sample meshes. Note that Algorithm [3] chooses sample nodes from three salient regions of 
the computational fluid domain: the wake region behind the body, and the region behind each cylindrical 
support. This implies that on average, the magnitude of the residual is highest in these regions during 
the training simulations. This is consistent with the fact that the flow is separated in these regions and is 
characterized by a strong vorticity. 

The two GNAT models employ nj = ur = 1514; this corresponds to 99.99% of the energy in the snapshots 
of the residual collected during the tier II ROM simulation. Both GNAT simulations are executed using only 
4 cores (as compared with 512 cores used for the high-dimensional model). The GNAT simulations employ 
the same Ahmed-body configuration and flow conditions used for the high-dimensional CFD simulation. 

Figure [8] reports the time histories of the drag coefficient predicted by the high-dimensional simulation 
and both GNAT ROM computations. Figure [9] contrasts the surface pressure contours at t = 0.1 s obtained 



3 Numerical experiments reported in Ref. |46| determined this to be an appropriate criterion. 
4 Residual snapshots are normalized before &r is computed. 
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(a) Sample mesh generated using snapshot-collection pro- (b) Sample mesh generated using snapshot-collection pro- 
cedure cedure 1 

Figure 7 Sample meshes with 378 sample nodes generated by Algorithm [3] Sample meshes are shown in red, 
within the computational fluid domain. 



Table 3 Online performance results of GNAT on 4 cores for a sample mesh with 378 sample nodes 



Snapshot-collection 
procedure 


RD 


Average # of Newton 
iterations per time step 


Wall time (hours) 


CR 


WT 





7.43% 


6.47 


7.37 


231 


1.80 


1 


0.68% 


2.75 


3.88 


438 


3.42 



using the high-dimensional model and the GNAT ROM based on snapshot-collection procedure 1. Table [3] 
provides the performance results for the ROM simulations. These results demonstrate the following: 

• Both snapshot-collection procedures and 1 lead to GNAT ROMs that deliver improvement in CPU 
performance (as measured in computational resources CR) exceeding 230. This occurs largely due 
to the drastic reduction in cores made possible by the sample-mesh implementation, which allows 
the ROM simulation to be executed on as few as 4 cores. In particular, the data suggest that 438 
parametric GNAT ROM simulations (using snapshot-collection procedure 1) could be executed in a 
predictive scenario using the same core-hours required by a single high-dimensional CFD computation 
(see Table [3]) — a test that will be conducted in the future. 

• When equipped with snapshot-collection procedure 1, which satisfies one consistency condition, the 
GNAT ROM reproduces almost perfectly the time history of the drag coefficient computed by the 
high-dimensional simulation. On the other hand, GNAT becomes less accurate when equipped with 
snapshot-collection procedure 0, which is inconsistent (see Figure [8]). Furthermore, GNAT requires 
fewer Newton iterations per time step for convergence (and performs faster) when it is equipped 
with snapshot-collection procedure 1 compared with snapshot-collection procedure (see Table |3| . 
These observations highlight the importance of the consistency concept introduced during GNAT's 
development. 

• When equipped with snapshot-collection procedure 1, GNAT delivers pressure-contour results that are 
almost identical to those computed by the high-dimensional simulation, including in the wake region 
behind the body where the flow is most complex (see Figure foj) . 
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High-dimensional model 
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Time (s) 
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Figure 8 Computed time histories of the drag coefficient (GNAT(i) refers to GNAT equipped with snapshot- 
collection procedure i). GNAT(f) directly overlays the high-dimensional model results. 




(a) High-dimensional CFD model 




(b) GNAT ROM equipped with snapshot-collection proce- 
dure 1 (n w = 283, tir = nj = 1514, and sample mesh with 
378 sample nodes) 



Figure 9 Surface-pressure contours at t = 0.1 s 
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Table 4 Sample-mesh attributes 



# of sample nodes 


# of nodes 


# of elements 


Fraction of nodes 
of original CFD mesh 


Fraction of elements 
of original CFD mesh 


253 


12808 


41014 


0.44% 


0.24% 


378 


17096 


56280 


0.59% 


0.33% 


505 


19822 


67082 


0.69% 


0.39% 



6.2.6. Effect of node sampling and interpolation vs. least-squares approximation 

To illustrate the effect of the number of sample nodes on GNAT's performance, this study considers 
three sample meshes: the sample mesh with 378 sample nodes introduced above (constucted using snapshot- 
collection procedure 1), a smaller sample mesh with 253 sample nodes, and a larger one with 505 sample 
nodes. Algorithm [3] is executed to generate these sample meshes; it employs parameters n c — 219 and 
®R — generated by snapshot-collection procedure 1. Table [4] reports the characteristics of these sample 
meshes. The GNAT models for these simulations are equipped with snapshot-collection procedure 1 and 
employ nj = tir = 1514 as in the previous section. Because v = 6, the hyper-reduction associated with 
253 sample nodes corresponds roughly to interpolation of the residual and its Jacobian. Indeed, the sample- 
index factor in this case is n — (253 x 6)/1514 ps 1. For the case of 378 sample nodes, n = 1.5; the sample 
mesh with 505 sample nodes is characterized by r\ — 2.0. These latter two cases correspond to least-squares 
approximation of the residual and its Jacobian. 

Figure [10] reports the time histories of the drag coefficient obtained using the high-dimensional model 
and the GNAT ROMs based on these three sample meshes. Table [5] provides the performance results for the 
ROM simulations obtained using 4 cores. These results indicate the following: 

• In all cases, GNAT reproduces the time history of the drag coefficient computed using the high- 
dimensional model with less than 1% discrepancy. 

• As sample nodes are added, the convergence of Newton's method at each time step improves on average. 

• The fastest performance of GNAT is obtained for the smallest sample mesh. 

• Interpolation of the residual and its Jacobian (253 sample nodes) does not lead to the best convergence 
of the Newton solver or the most accurate results. However, it does lead to the best overall CPU 
performance of GNAT in this case. 



Table 5 Online performance on 4 cores of GNAT equipped with snapshot-collection procedure 1 for various 
sample meshes 



# of sample nodes 


V 


RD 


Average of Newton 
iterations per time step 


Wall time (hours) 


CR 


WT 


253 


w 1 


0.79% 


4.38 


3.77 


452 


3.52 


378 


1.5 


0.68% 


2.75 


3.88 


438 


3.42 


505 


2.0 


0.75% 


2.25 


4.22 


403 


3.15 



6.2.7. Parallel scalability 

Due to the sample mesh concept, GNAT is parallelized in the same manner as a typical CFD code 
is, using mesh partitioning. However, because GNAT operates on a dramatically smaller mesh, its parallel 
performance cannot be expected to scale in the strong sense — that is, for a fixed ROM size and an increasing 
number of processors. This is also true for the online stage of any other model-reduction method. 

To obtain an idea of the strong scaling that can be expected from a nonlinear model-reduction method, 
Table [6] reports the CPU performance results obtained for GNAT equipped with snapshot-collection proce- 
dure 1, the sample mesh with 378 sample nodes, and nj = jir = 1514. Excellent speedups are obtained 
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Figure 10 Computed time histories of the drag coefficient for different numbers of sample nodes (GNAT(f) 
refers to GNAT equipped with snapshot-collection procedure 1) 

Table 6 Assessment of GNAT's strong scaling performance for a sample mesh with 378 sample nodes 



# of cores 


Wall time (hours) 


Speedup 


WT 


CR 


1 


16.1 


1.0 


0.83 


422 


2 


8.74 


1.84 


1.52 


388 


4 


3.88 


4.14* 


3.43 


438 


8 


2.50 


6.44 


5.32 


340 


12 


1.94 


8.25 


6.86 


292 


16 


2.08 


7.74 


6.39 


204 



*This superlinear speedup is likely due to caching and other memory management effects. 



for a number of cores varying between 2 and 8, a good speedup is obtained for 12 cores, and a reasonable 
one is obtained for 16 cores. For a larger number of cores, the parallel efficiency (defined as the ratio of the 
speedup to the number of cores) increasingly deteriorates. This is not surprising given that the GNAT ROM 
operates on a mesh with only 378 sample nodes. 

6.2.8. Performance comparison with other function- sampling ROM methods 

To conclude this section, the performance of GNAT equipped with snapshot-collection procedure 1 and 
n J = n R = 1514 is compared to that of other hyper-reduction techniques based on function sampling. This 
study employs the same wake flow problem, the same state POD basis of dimension n w = 283, and same 
sample mesh with 378 sample nodes. The following function-sampling techniques are compared with GNAT: 

1. A collocation of the nonlinear equations followed by a Galerkin projection of the resulting over- 
determined system of 2268 nonlinear equations (378 sample nodes x 6 equations per node) with 
n w = 283 unknowns [291120]. 

2. A collocation followed by a least-squares solution of the resulting over-determined system [2"5] . 

3. A discrete empirical interpolation method (DEIM)-like [SI] approach that employs snapshot-collection 
procedure and tir = nj = 2268 so that the residual and Jacobian functions are approximated by 
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interpolation. The tested approach employs the tier II Petrov-Galerkin solution of the overdetermined 
equations as opposed to the Galerkin projection; this is done to isolate the effect of the hyper-reduction 
technique on performance. 



Figure 11 reports the time histories of the drag-coefficient computed using all hyper-reduction techniques 
outlined above. Both collocation approaches lead to nonlinear instabilities after a few time steps of the 
flow simulation, thereby exposing the weakness of collocation for highly nonlinear problems. The DEIM-like 
approach, which employs the popular but inconsistent snapshot-collection procedure 0, also performs poorly. 
For this approach, the Newton iterations begin to generate zero search directions after only a few time steps 
of the flow simulation. 
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0.27 
0.26 
0.25 
0.24 
0.23 
0.22 



■ High-dimensional model 

■ GNAT(l) 

■ Collocation + Galerkin projection 

■ Collocation + Least-squares 

■ DEIM-like 




0.04 0.06 
Time (s) 
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Figure 11 Computed time histories of the drag coefficient (GNAT(l) refers to GNAT equipped with snapshot- 
collection procedure 1) 



7. Conclusions 

In this work, the Gauss-Newton with approximated tensors (GNAT) nonlinear model-reduction method is 
equipped with a sample-mesh concept that eases the implementation of its online stage on parallel computing 
platforms. This work also develops global state-space error bounds that justify GNAT's design, characterize 
its mathematical properties, and highlight its advantages in terms of minimizing components of these bounds. 
The effectiveness of GNAT on parametric problems and its robustness for highly nonlinear computational 
fluid dynamics (CFD) applications characterized by moving shocks is demonstrated by the solution of a 
conservation problem described by the inviscid Burgers' equation with a variable source term and boundary 
condition. GNAT's ability to reduce by orders of magnitude the core- hours required to compute turbulent 
viscous flows at high Reynolds numbers, while preserving accuracy, is demonstrated with the simulation of 
the flow field in the wake of the Ahmed body. For this popular benchmark problem with over 17 million 
unknowns, GNAT is found to outperform several other nonlinear model reduction methods, reduce the 
required computational resources by more than two orders of magnitude, and deliver a solution with less 
than 1% discrepancy compared to its high-dimensional counterpart. 
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Appendix A. Proof of consistent snapshots for the state POD basis 

The following proposition proves that two options for collecting state snapshots lead to a consistent 
projection. For the sake of simplicity, one set of training inputs ^ tram is considered and therefore 2?train — 
retrain j_ R eca ll the reduced-order- model solution is defined by Eq. ^ as 

w n (y) = w°(fi) + $ w w?(n), n = 0,...,n t . (A.l) 

Proposition Appendix A.l. Consistency of the state snapshots. Assume the following: 

1. The set of snapshots Wi or W2 is used to compute 3> w via POD, where 

W 1 = {u/V rain ) " 10 V™") \ n = l,...,n t } (A.2) 
W 2 = { W "( M train ) - u/ 1 " V™") I n = 1, . . . , n t }. (A.3) 

2. The Gauss-Newton method is employed to compute solutions u>™ +1 (/i) to the nonlinear least-squares prob- 
lem 

< +1 0) = arg min f n (y;n), (A.4) 
for n = 0, . . . , n t — 1 and any [i E T>. Here, 

P(y; M ) = i||ir>°G«) + ^ (A.5) 

The residual R n arising from the sequence of reduced- order-model solutions is related to the residual R n 
arising from the sequence of high-dimensional-model solutions as follows: 

R n (w; fi) = S n (w,ti n , . . . ,w 1 ,w°; fi) (A.6) 
S n (w,w n ,...,w\w°;fi) = R n (w,fi), (A.7) 

for n = 0, . . . , n± — 1 and any fi G T>. Here, S n explicitly reflects the dependence of the residual on the 
state at previous time steps. 

3. The reduced-order model employs the same initial condition as the high- dimensional model: 

w°(p) = 0. (A.8) 

4-. The standard assumptions (see Theorem 10.1 \4% ) needed for the convergence of the Gauss-Newton 
iterations to a stationary point contained in the level set C n (/J,): 

e M n (p), (A.9) 

for n = 0, . . . , rit — 1 and any /iGP. Here, define 

ee {y I < /" (w^^)} (A.10) 

M n ( M ) = {y\ye £ n ( M ), V/ n (y; /*) = o}. (All) 

5. The level set C n (/i) contains only one stationary point: \M n (fi)\ — 1 for n = 0, . . . , n t — 1 and any fi £ T>. 

Then, the projection approximation is consistent in the sense that the Petrov-Galerkin ROM (i.e., GNAT 
without hyper-reduction) associated with a POD basis <5> w that is not truncated computes the same states as 
the original high- dimensional CFD model for the training inputs — that is, 

~n( train\ n [ trainx „ n IK i <~i\ 

w {/j, )=w (n ), n = 0, . . . ,n t . ( A -12) 
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Proof: Consider computing solutions w™(// rain ), n — 0, . . . ,n t under the stated assumptions. In the sequel, 
the argument ^ tram is dropped for notational simplicity. The result, i.e., Eq. (A. 12), is proven by induction. 
It is true for n = due to Assumption [3] Assume now that w 1 = w l , i — 0, . . . , n. 

Assumption [2] ensures that the Gauss-Newton method is used to compute the solution w" +1 . Assumption 
[4] guarantees that these Gauss-Newton iterations will converge to a local stationary point in the level set 
C n . Therefore, 

< +1 g M n . (A.13) 



Assumption [T] ensures that 



w 



n+1 - w° G range ($„ 



(A.14) 



To see this, first consider the case where Wi is used to compute & w . Then, u>" +1 (/i train )— iti°(/j tmm ) G Wi and 
therefore iu n+1 (^ train ) -u; (^ train ) G span(Wi). If the POD basis $> w is not truncated, then range (& w ) = 
span(Wi) and Eq. ( |A.14 ) holds. Now, consider the case where W2 is used for computing Q w . Because 



yi+i^train) _ ^i^tram) e W 2 , £ = 0, . . . , ra t , then w n+1 (/x train ) - w°(^ tra in ) G span (W 2 ). If the POD basis 
& w is not truncated, then range (& w ) = span(W 2 ), and again Eq. (A.14) holds. 
The induction assumption (w* — w l ,i 



0, . . . , n) and Eq. (A.14) ensure that 



$1 (w n+l - w°) G M' 



This can be derived by setting y — $^ (w" +1 



w°) and writing the objective function: 
1, 



0. 



w 



°))ll 



;\\R n ^ n+1 )\\ 2 2 
;\\R n (™ n+1 )\\l 



(A.15) 

(A.16) 

(A.17) 

(A.18) 
(A.19) 



Eq. (A.17) is due to Eq. (A.14) and the orthogonality of the POD basis. Eq. (A.18) arises from the equalities 
R n ( w ) ^ S n {w,w n ,...,w\w°) = S n (w,w n ,...,w 1 ,w°) = R n (w), (A.20) 



which hold due to the induction assumption. Finally, Eq. (A.19) holds because the full-order solution leads 



to a zero residual: R n (w n+1 ) = 0. Because f n (y) > Vy, Eq. (A.19) implies that $J (w n+1 — w°) is a local 
minimizer of /", so Eq. (A.15) holds. 

Assumption [5J Eq. (A.13), and Eq. (A.15) together imply 



w? +1 = (w n+1 



Substituting Eq. (A.21I into Eq. (A.l) yields 



w n+1 =w Q + (w n+1 - w°) 



Eq. (A. 22) along with Eq. (A.14) and the orthogonality of the POD basis provides the result: 



w"+V rain ) = w" + V rain ), n = 0,.. . ,n t . 



(A.21) 



(A.22) 



(A.23) 



□ 
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Appendix B. Further discussion of the various snapshot-collection procedures 

d[BZR(w (0) + <f> w y) 



If $j = then A = B and BZJ^<$> V 



dy 



As a result, the GNAT iterations 



(15|-(16) are in this case equivalent to the Gauss Newton iterations for solving 



minimize \\BZR{w)\\2- 



(B.l) 



Because ||BZi?|| 2 = \\<&rBZR\\ 2 when = / and the gappy POD approximation of R is R — QrBZR, 

the GNAT iterations are also equivalent to the Gauss-Newton iterations for solving 



minimize ||i2(iu)||2. 



(B.2) 



Therefore, when $j — $^ and has orthonormal columns, GNAT inherits the convergence properties 
of the Gauss-Newton method. This is the rationale behind both procedure and procedure 1 outlined in 
Section [3X21 

On the other hand, procedure 2 and procedure 3 use different bases and <!>/. For this reason, the GNAT 



iterations ( 15 ) ( 16 ) cannot be associated with Gauss-Newton iterations for nonlinear residual minimization. 



Furthermore, choosing $j ^ causes the least-squares problem (15) to try to 'match' quantities that lie 



in different subspaces. For these reasons, procedure 2 and procedure 3 may lack robustness and experience 
convergence difficulties as reported in Ref. [46j . 



Appendix C. Error bounds for the solution computed by a discrete nonlinear model reduction 
method 



This section proves the error bound (22 ) presented in Section [4] For the sake of notational simplicity, the 
derivation presented here considers the approximation error arising from a given set of inputs and therefore 



omits fj, from the arguments of the nonlinear functions. Rewriting the residual (19) in this fashion leads to 

R n (w n+1 ) = w n+1 - w n - AtF(w"+\t n+1 ). (C.l) 

Similarly, the residual at the the n-th time step arising from any sequence of approximate solutions w n , 
n = 0, . . . , nt, e.g., generated by a discrete nonlinear ROM, for the same input parameters can be written as 



R n {w n+L ) = w n+L -w n - AtF{w n+L ,t n+L ) 



Subtracting ( |C.2| ) from (jCTj) yields 
R n {w n+1 ) - R n (w n+1 ) = w n+1 - w n - AtF(w n+1 ,t n+1 ) - w n+1 + w n + AtF(w n+1 , t n+1 ). 
The above expression can be re-arranged as 



„n+l „7,™+l 



AtF{w n+i , t n+i ) + AtF(w n+ \t n+i ) = R n {w n+i ) - R n {w n+i ) +w n - w n . 



Introducing / : (x, t) i-> x — AtF(x, t) and the inverse Lipschitz constamj^] 

IN -3/11 



allows Eq. (C.4) to be transformed into the following bound on the local approximation error: 

- X || < Ob (eNcwton + \\R n (w n+1 )\\ + |K - tS"||' 



(C.2) 
(C.3) 
(C.4) 

(C.5) 
(C.6) 



5 Note that e = -Ar in Eq. d20l 
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Assuming that the initial approximation error is zerc^] (w° = w°), the inequality (C.6) leads to the 
following result 



< a k b n -k, 



fc=i 



where a — Lq = sup,^^ n i Cq and 



6n = eNcwton + ||i? n (*" +i )|| 



(C.7) 



(C.8) 



From the triangle inequality, it follows that \\R n {w nJrl )\\ < ||P,R"(w; n+1 )|| + || (J — P) R n (w n+1 )\\ for any 
P. Hence, another bound for the approximation error is 



,,n ~n 



w n \\ <^a k c, 



n—k : 



k = l 



where 



On = Newton + \\PR n (w n+ ')\\ + ||(I- P)R n (w n+i )\ 



(C.9) 



(CIO) 



and c n > b n . The bound (C.9) is particularly interesting for the case where P — [ZQr] Z represents 
the gappy POD operator because ||P.R n (u> ,l+1 )|| = || [Z$ R ] + ZR n (w n+1 )\\ is readily computable by GNAT. 
In |Appendix D| it is shown that an upper bound for the gappy POD approximation error is 



(I - P)R n (w n+1 )\\ < HR^IHK/ 



n {w n+1 )\\ 



(Oil) 



where P = ^.r^ defines the orthogonal projector onto range ($>r), and ZQr = QR is the thin QR factor- 



ization of Z^ji with Q G 



and R G 



bound for the approximation error is 



Therefore from (Oil), it follows that yet another error 



\w n -W n \\ <J2 ak dn-k, 



k=l 



where 



d„ = ENcwton + \\PR n (w r 



IR- 1 !!!! (j-p) R n ( 



(C.12) 



(C.13) 



Because b n < c n < d n , it follows that a global bound for the approximation error at the n-th time step with 
1 < n < n t is given by 



< akb n-k < a k c n ^ k < Y a k d n -k- 



fe=i 



fc=i 



fc=i 



(C.14) 



Appendix D. Error bound for the gappy POD approximation 

This section establishes a bound for the error associated with the gappy POD approximation of a vector 
g G K w using a POD basis $/ G ~R Nxn f and a set of > n,f sample indices I that define the sample matrix 



Z (see Section 3.4.1 for these definitions)^] 

Define g* = Fg with P = Qf&J as the orthogonal (i.e., optimal) projection of g onto range ($/). Also, 
define the difference between g and its orthogonal projection onto range (<&/) as e = g — g* . Finally, define 
the gappy POD projection matrix P = (f> g R _1 Q T Z, where Z$f = QR is the thin QR factorization of Z&f 
with Q G R n * xn J and R G R n f xn f. 



6 This is valid for both the Petrov-Galerkin and GNAT ROMs as they employ the same initial condition as the high- 
dimensional model (See Algorithm [ill. 

7 This development follows closely the proof of Lemma 3.2 in Ref. 131 1 ■ 
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The gappy POD approximation of g is Pg = P (e + g* ) . It can also be written as 

Pg = Pe + g* 



Because II J — P|| 



I-PH2 for any projection matrix P not equal to or J, it follows that 
\\I-Ph = \\Ph = ||$ a R- 1 Q r Z|| a = llR-^la. 



(D.l) 



because Pg* = g*, as g* G range($/). Substituting g* = g — e into Eq. (D.l) yields (I - P)g = (I — P)e. 
Therefore 

||(/-P)5||2 = ||(/-P)e|| 2 <||(J-P)|| 2 ||e|| 2 . (D.2) 



(D.3) 



The last equality follows from the fact that <& ff , Z , and Q have orthonormal columns. Substituting (D.3) 



in (D.2) gives the result 



||(J-P) ff || 2 <||R- 



(D.4) 
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